<html>
<link href="../enzo.css" rel="stylesheet" type="text/css">
  <head>
    <title>Data Handling Issues</title>
  </head>
<body> 
    <h1>Data Handling Issues</h1>

<p>There are significant issues concerning the handling of data output from Enzo.  
The datasets are, by nature, quite complex - 
each grid patch in the AMR hierarchy has its own file, and each file has anywhere between 5 and 25 datasets of 
both grid-based and particle data.  Large AMR calculations can have tens of thousands of files, and take up 
multiple gigabytes, though typical simulations performed up to the present day are much smaller.  
The largest simulation done to date with Enzo (from the standpoint of data analysis) is 
a unigrid calculation of the Lyman Alpha Forest done on a 1024^3 root grid and following multiple primordial 
species.  Each data hierarchy that was output took up over 100 gigabytes, and all of the output from the 
entire siulation took up over 5 terabytes!</p>

<p>There are two primary problems with datasets of this size and complexity.  First, many operating systems 
have serious problems dealing with a directory where there are thousands to hundreds of thousands of files - 
if 
you type <tt>'ls -l *.grid*'</tt>in a directory with on the order of a thousand files, 
<tt>ls</tt> will crash.  Most
 other unix commands, such as <tt>foreach</tt>, will also crash when confronted with such large numbers of 
files.  The net result of this is that it is important to have a trustworthy backup script which can take 
care of putting all of the simulation data from each output into a subdirectory and tar it up.  We have made 
one such script available <a href="example_backup_script.bat">here</a>.</p>

<p>A second main problem is archiving the data.  It's VERY difficult to move around huge quantities of data,
and it can take hours for a mass storage system to back up the largest simulations.  Long, painful experience
has shown that when backing up Enzo data to a tape backup system (such as NCSA's UniTree or SDSC's HPSS) it
is always better to back up a few very large files than thousands of tiny ones.  This speeds up retrieval of
the data and also reduces the chances of there being errors when putting data into the mass storage system.  Note
that this may or may not be an issue for you - some NPACI backup systems are rock solid (such as NCSA's UniTree)
 while others are not necessarily trustworthy (SDSC HPSS).  Regardless of the reliability of the mass storage
system, reduction in complexity of retrieval is always a good thing.  It may also be a wise idea to check
the error codes returned by the software controlling the mass storage system to ensure that the data has been
properly stored.
</p>

<p>
It is also important to note that some filesystems may have problems when Enzo attempts to write thousands of
files in a short period of time.  Though the code performs error checking to make sure that files are written,
hardware-level errors can possibly result in loss of data.  If you discover that there is a problem with files
'disappearing' this may be the problem - talk to your system administrators.
</p>


<p>&nbsp;</p>
<p>
<a href="machine_top.html">Previous - Machine Notes</a><br>
<a href="analysis_cook_top.html">Next - Data Analysis</a><br>
</p>

<p>&nbsp;</p>
<p>
<a href="../index.html">Go to the Enzo home page</a>
</p>
<hr WIDTH="100%">
<center>&copy; 2004 &nbsp; <a href="http://cosmos.ucsd.edu">Laboratory for Computational Astrophysics</a><br></center>
<center>last modified February 2004<br>
by <a href="mailto:bwoshea (AT) lanl.gov">B.W. O'Shea</a></center>



</body>
</html>
